Mini Project Visualisation


Sanrio is a major Japanese entertainment company, with a market cap of £6.42 billion. Founded in 1960 under Shintaro Tsuji, the company has created over 450 characters including the global icon Hello Kitty. Beyond character design, the company’s influence extends through the production of theme parks, merchandising, animated films, video games, and collaborations with high-profile brands such as Starbucks, Adidas, and Swarovski. To engage its’ international fanbase, Sanrio launched the Sanrio Character Ranking event in 1986, originally published in a Japanese newspaper and now conducted online. This attracts widespread participation ( 65 million votes in 2025 ) and generates rich data that reveals trends of fan preferences across countries and regions.

My aim for this visualisation is to:

# Load in libraries
library(ggalluvial)
library(tidyverse)
library(ggplot2)
library(plotly)
library(readxl)
library(dplyr)

Data Origins

The data was sourced online through a public GitHub repository created by user FannyCheung, that contained 3 sheets - total ranking by year, oversea ranking and regional ranking. The original source has since been removed after work on this project, and is no longer online. To keep the project reproducible, a copy of the extracted dataset is included within the github repository. The data was originally created utilising data from the official sanrio website.

#Loading the data in
sanriocharacterranking <- read_excel("../data/sanriocharacterranking.xlsx", 
                                     sheet = 3)
# Quick check of raw data
glimpse(sanriocharacterranking)
## Rows: 573
## Columns: 4
## $ year             <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,…
## $ character        <chr> "Gudetama", "Yoshikitty", "Kuromi", "Gudetama", "Pomp…
## $ ranking          <dbl> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3,…
## $ `country/region` <chr> "South Korea", "South Korea", "South Korea", "Taiwan"…

Data Cleaning

The data contains 3 sheets. I am interested in the regional rankings, as it contains the year, country, character and rank. I filtered out countries aside from South Korea, China, Taiwan, USA and UK as I felt that these were accurate representations of Sanrio’s fanbase.

# Seperating only the countries I want to look into

countries <- c('United Kingdom', 'United States', 'Taiwan', 'China', 'South Korea')
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#Cleaning the data, changing the name of the 'country/region' variable and removing ranks higher than 3.

sanrio_clean <- sanriocharacterranking %>% 
  select(year, character, country = `country/region`, ranking) %>% 
  filter(  country %in% countries)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#Just checking it's all in order
glimpse(sanrio_clean)
## Rows: 216
## Columns: 4
## $ year      <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, …
## $ character <chr> "Gudetama", "Yoshikitty", "Kuromi", "Gudetama", "Pompompurin…
## $ country   <chr> "South Korea", "South Korea", "South Korea", "Taiwan", "Taiw…
## $ ranking   <dbl> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …

Initial (bad) visualisation

With this cleaned data, I first wanted to create an alluvial plot with seperate facets for each country. The idea here was to show my aims clearly, but it quickly became evident that this approach was not working - there were too many characters and colours and it was hard to follow what was happening ( as seen below ). On top of this, the data was presenting an issue. For most characters, there wasn’t much movement from year to year, and for most characters they were only in the rankings once and never re-appeared. This results in the plot being both overcrowded within the legend, and dull with movement - an example of a bad visualisation.

#Plotting an alluvial with the previous data. 
ggplot(
  data = sanrio_clean,
  aes(x = year, stratum = ranking, alluvium = interaction(character, country),
    y = 1, fill = character)) +
  geom_stratum(colour = NA) +
  geom_alluvium(alpha = 0.4) +
  facet_wrap(~ country) +
  labs(
    title = "Sanrio character ranking over time by Country",
    x = "Year",
    y = ""
  ) +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.line.y = element_blank()
  )

Data wrangling

After reflecting upon my data further, I realised that the clearest way to show my data would involve transforming it. Including all aspects of the data - the year, the specific ranking placement, the character name and the country, would not create a good visualisation. Instead, I had the idea of transforming the data into how many times each character had been within the ’ top 3’ rankings, separated by country. This would give each character more information to work with and show across years.

#Removing data from before 2020, and collecting information on how often each character was within the top 3 by year. Creating a new column for this information. 

sanrio_top3 <- sanrio_clean %>%
  filter(year > 2019, ranking < 4) %>% 
  group_by(year, character, country) %>%
  summarise(
    total_top3 = n(),
    ranking = paste(sort(ranking), collapse = ", "),
    .groups = "drop"
  )
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Quick view of the wrangled data.

glimpse(sanrio_top3)
## Rows: 90
## Columns: 5
## $ year       <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ character  <chr> "Aggretsuko", "Cinnamoroll", "Cinnamoroll", "Cinnamoroll", …
## $ country    <chr> "United States", "China", "Taiwan", "United Kingdom", "Unit…
## $ total_top3 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ ranking    <chr> "3", "1", "2", "3", "2", "3", "1", "1", "1", "2", "3", "3",…

Final visualisation

Here, I created an interactive, animated bar chart using plotly. This helped my goals of:

  • Showcasing preference trends over time ( viewing character popularity year by year )

  • Showcasing preference by region ( each chart is split by country )

#Plotting an interactive, animated bar chart with plotly. With the new variable of total_top3 on the X axis, character name on the Y axis, coloured by country and framed by year. Adding ranking into the hover info for further detail.

plot_ly(sanrio_top3, 
        x = ~total_top3, 
        y = ~character, 
        color = ~country, 
        frame = ~year, 
        type = 'bar', 
        orientation = 'h', 
        text = ~paste("Character:", character,
                      "<br>Country:", country,
              "<br>Ranking:", ranking), 
        hoverinfo = 'text') %>%
  layout(title = "Top Characters per Year by Country",
         xaxis = list(title = "Number of times each character is in the top 3"),
         yaxis = list(title = ""),
         barmode = 'stack') %>%
  animation_opts(frame = 1000, transition = 500, redraw = TRUE) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))

Final thoughts!

This plot helps to understand the data in a way that a static chart cannot. Due to the amount of elements in the data set,I had found in the bad visualisation that it would be hard to see patterns clearly with a simple plot, utilising plotly helped bring these patterns to light.

Altogether, these elements are what led to the creation of an interactive plot being the right choice.

If I were to do this project again, I would include more years or more rankings, to create a more complex visualisation. The data set was quite bare, and had some inconsistencies within that limited the years and rankings I could use. I would search for another data set, or create my own, to achieve this.

The final visualisation helped to showcase all of my aims. Character trends were evident, with characters such as Cinnamoroll being consistently voted in China, yet not as much in South Korea. Individual anomalies were also evident, for example U-SA-HA-NA appearing most in 2025 coincides with the release of Hello Kitty Island Adventure - a game in which the less recognised U-SA-HA-NA appears as a main character. Finally,temporal shifts were shown like Cinnamoroll’s increase in overall popularity between 2021 and 2024, with a slight decline in 2025 as more varied characters entered the mix.

Repository for this visualisation can be found here.